 data strategy


Secure, Scalable and Privacy Aware Data Strategy in Cloud

Butte, Vijay Kumar, Butte, Sujata

arXiv.org Artificial Intelligence

Enterprises today face the tough challenge of processing and storing large amounts of data in a secure, scalable manner while enabling decision makers to make quick, informed, data-driven decisions. This paper addresses this challenge and develops an effective enterprise data strategy in the cloud. Various components of an effective data strategy are discussed, and architectures addressing security, scalability and privacy aspects are provided.


On Synthetic Data Strategies for Domain-Specific Generative Retrieval

Wen, Haoyang, Guo, Jiang, Zhang, Yi, Jiang, Jiarong, Wang, Zhiguo

arXiv.org Artificial Intelligence

This paper investigates synthetic data generation strategies in developing generative retrieval models for domain-specific corpora, thereby addressing the scalability challenges inherent in manually annotating in-domain queries. We study the data strategies for a two-stage training framework: in the first stage, which focuses on learning to decode document identifiers from queries, we investigate LLM-generated queries across multiple granularities (e.g. chunks, sentences) and domain-relevant search constraints that can better capture nuanced relevancy signals. In the second stage, which aims to refine document ranking through preference learning, we explore strategies for mining hard negatives based on the initial model's predictions. Experiments on public datasets over diverse domains demonstrate the effectiveness of our synthetic data generation and hard negative sampling approach.
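
The second-stage idea described above, mining hard negatives from an initial model's predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mine_hard_negatives`, the dictionary layout, and the cutoff `k` are all hypothetical, and the ranked lists are assumed to come from some first-stage retrieval model.

```python
from typing import Dict, List, Tuple


def mine_hard_negatives(
    ranked: Dict[str, List[str]],
    positives: Dict[str, str],
    k: int = 2,
) -> List[Tuple[str, str, str]]:
    """Build (query, positive, negative) triples for preference learning.

    ranked:    query -> document ids ordered by the initial model's score
               (hypothetical input format, assumed for illustration).
    positives: query -> the single gold document id for that query.
    k:         number of negatives to keep per query.
    """
    triples: List[Tuple[str, str, str]] = []
    for query, candidates in ranked.items():
        gold = positives[query]
        # Highly ranked documents that are NOT the gold answer are "hard":
        # the initial model already confuses them with the correct document.
        negatives = [doc for doc in candidates if doc != gold][:k]
        triples.extend((query, gold, neg) for neg in negatives)
    return triples


# Toy example with made-up document ids.
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
positives = {"q1": "d1", "q2": "d2"}
triples = mine_hard_negatives(ranked, positives, k=2)
```

Each resulting triple pairs a query's gold document with a wrongly high-ranked one, which is the kind of input a preference-learning objective would typically consume.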


Data strategies for AI leaders

MIT Technology Review

The expectation that generative AI could fundamentally upend business models and product offerings is driven by the technology's power to unlock vast amounts of data that were previously inaccessible. "Eighty to 90% of the world's data is unstructured," says Baris Gultekin, head of AI at AI data cloud company Snowflake. "But what's exciting is that AI is opening the door for organizations to gain insights from this data that they simply couldn't before." In a poll conducted by MIT Technology Review Insights, global executives were asked about the value they hoped to derive from generative AI. Many say they are prioritizing the technology's ability to increase efficiency and productivity (72%), increase market competitiveness (55%), and drive better products and services (47%).


AI security and cyber risk in IoT systems

Radanliev, Petar, De Roure, David, Maple, Carsten, Nurse, Jason R. C., Nicolescu, Razvan, Ani, Uchenna

arXiv.org Artificial Intelligence

However, this extensive integration of IoT devices has also introduced significant cybersecurity risks. The Internet of Things (IoT) has attracted the attention of cybersecurity professionals after cyber-attackers started using IoT devices as botnets (Palekar and Radhika 2022). IoT devices are often vulnerable to various cyber threats, including distributed denial-of-service (DDoS) attacks, botnet exploitation, and data breaches, all of which can compromise critical systems' integrity, confidentiality, and availability. Understanding and mitigating the risks associated with IoT deployments is crucial in this evolving landscape, especially given the interdependencies between IoT components and systems.


360Zhinao Technical Report

360Zhinao Team

arXiv.org Artificial Intelligence

For rapid development in pretraining, we establish a stable and sensitive ablation environment to evaluate and compare experiment runs with minimal model size. We also mainly emphasize data during alignment, where we strive to balance quantity and quality with filtering and reformatting. With tailored data, 360Zhinao-7B's context window is easily extended to 32K and 360K. RMs and RLHF are trained following SFT and credibly applied to specific tasks. Altogether, these contributions lead to 360Zhinao-7B's competitive performance among models of similar size. In recent years, the field of natural language processing (NLP) has witnessed a profound transformation, fueled by the advent of large language models (LLMs) (Bubeck et al., 2023; Touvron et al., 2023a; OpenAI, 2023), which have emerged as a cornerstone to revolutionize the way we understand and generate human language. LLMs represent a new paradigm in artificial intelligence (AI) research, characterized by their immense scale, complexity, and versatility (Zhao et al., 2023). Those models, typically built upon advanced neural network architectures like Transformers, are trained on vast amounts of text data, encompassing billions or even trillions of words. The extensive training endows LLMs with a deep understanding of linguistic structures, nuances, and context, enabling them to generate human-like text and perform a myriad of NLP tasks with unprecedented accuracy and fluency (Yang et al., 2024). Despite the impressive capabilities of LLMs, training an LLM from scratch still faces several challenges. The training journey can be divided into two stages: the pretraining stage and the alignment stage (Zhang et al., 2023). The pretraining stage involves the model learning on large-scale textual data to build its foundational knowledge and language comprehension. However, two obstacles stand out in the pretraining stage (Zhao et al., 2023). First, refining the training corpus to enhance the base model's performance is paramount given the enormity of pretraining data. While extensive research has delved into data cleaning and sampling methodologies (Soldaini et al., 2024; Penedo et al., 2023; Wenzek et al., 2019; Gunasekar et al., 2023), the sheer scale and intricacy of pretraining datasets still leave ample room for elevating informational density and efficiency. Second, establishing a stable and sensitive ablation environment for accurately assessing data strategies poses another challenge (Chang et al., 2024; Zhou et al., 2023).


Data at the center of business

MIT Technology Review

With more than 5,000 branches across 48 states and 80 million customers, each with its own unique requirements to satisfy its customers' financial needs, a clear data strategy is key for JPMorgan Chase. According to Mark Birkhead, firm-wide chief data officer at JPMorgan Chase, data analytics is the oxygen that breathes life into the firm to deliver growth and improve the customer experience. Providing first-class business in a first-class way for clients and customers applies to every part of the firm, including its heavy investments in data analytics, machine learning, and AI. Using these advanced technologies, JPMorgan Chase can gain a deeper understanding of the breadth and specificity of the needs of the customers and communities it serves. "It means using our data to drive positive outcomes for our customers and our clients and our business partners. And it means using this to actually help our customers and clients manage their daily lives in a better, simpler way," says Birkhead.


Product Owner, Data Strategy at NBCUniversal - Englewood Cliffs, New Jersey, United States

#artificialintelligence

NBCUniversal owns and operates over 20 different businesses across 30 countries including a valuable portfolio of news and entertainment television networks, a premier motion picture company, significant television production operations, a leading television stations group, world-renowned theme parks and a premium ad-supported streaming service. Here you can be your authentic self. As a company uniquely positioned to educate, entertain and empower through our platforms, Comcast NBCUniversal stands for including everyone. We strive to foster a diverse and inclusive culture where our employees feel supported, embraced and heard. We believe that our workforce should represent the communities we live in, so that together, we can continue to create and deliver content that reflects the current and ever-changing face of the world.


Head, Data Analytics at Standard Bank Group - Douglas, Isle of Man

#artificialintelligence

To translate the Standard Bank Group (SBG) data vision and strategy into applicable data strategies in Standard Bank Offshore to support SBG objectives. To oversee the implementation of the data strategy by co-ordinating and facilitating data programmes to enable data-driven business decisions that are consistent and effective. To enforce governance and compliance, ensuring alignment to SBG framework, policies, and standards. To provide the business leadership role that has the primary enterprise accountability for value creation by means of the organization's data and analytics assets, as well as the data and analytics ecosystem. To define, develop, and execute the data monetisation strategy by providing guidance, input and leadership across data and analytics using AI, ML, and advanced data science methodologies.


Should data analysts worry about ChatGPT? - TechNative

#artificialintelligence

Is conversational AI a blessing or curse for data? If you follow the tech industry, you’ve heard about ChatGPT. Whether you think it’s the future of chatbot technology or you’re erring on the side of caution, if you know about it, you’re bound to have an opinion. As Google confirms it’s launching a rivalling service, interacting with AI will soon become commonplace in our personal and professional lives. But what does that mean for data and analytics? Here, Jonathan Hedger, co-founder of the UK’s only data jobs board, Only Data Jobs, explores. Launched late in 2022, ChatGPT has quickly become


Senior Manager, Data Strategy at Zeta Global - Remote - United States

#artificialintelligence

Zeta is seeking an experienced, mid-level Strategist to join our growing Data Cloud group in response to rapid client acquisition and the need for additional services support. This is a key role, providing insight and guidance on paid media and CRM activation strategies, based on Zeta's proprietary data assets, to Director and C-Level executives at clients across all verticals. Our Data Cloud group takes an innovative approach to help marketers identify insights and size opportunities, develop audiences/segments for targeting, construct media mix plans, partner with creative and messaging as well as test and measure the incremental impact of campaigns. The role will help develop and refine the go-to-market approach for Zeta's business units as well as partner with client services and sales teams to pitch and scope prospect opportunities. In addition, the Director, Data Cloud Strategy will work closely with internal stakeholders across operations, product, engineering and marketing teams to influence product roadmap as well as support the development of marketing and sales content.